

ABSTRACT

1,021,226. Automatic speech recognition. INTERNATIONAL BUSINESS MACHINES CORPORATION. Nov. 9, 1962 [Nov. 14, 1961], No. 42414/62. Heading G4R. In a system for recognizing spoken words, specific characteristics of speech are identified, and there are means for establishing the time sequence of the characteristics in order to provide variable-amplitude signals representative of the occurrence of particular spoken words. The characteristics identified are voicing and friction, and a suitable circuit is shown in Fig. 2. The input signal, e.g. from a microphone, is applied to the bases of two transistors 20, 21 having their emitters connected to earth. The collectors are connected by resistors, via an R-C smoothing circuit to the input and via A.C. coupling capacitors 23 to a transistor 25. The circuit responds to the asymmetric signals which are produced by a spoken vowel to give a positive or negative signal at output 37, depending upon the vowel spoken. An R-C feedback loop 38 is adjustable to cause the circuit to respond with a positive signal to the vowel of the word "3" and with a negative signal to the word "4". The transistor 25 amplifies the A.C. component of the input signal. This is differentiated at 26 and the zero-crossing pulses are applied via diode 27 to an integrator circuit 29, which develops a positive voltage when friction frequency components are present. The amplified signals are further amplified by transistor 40, again differentiated (at 42) and applied via diode 43 to integrator 44, which develops a negative voltage when friction is present. The integrators 29 and 44 are connected to opposite ends of a potentiometer, the arrangement being such that for strong frictional sounds positive integrator 29 develops a strong charge and negative integrator 44 becomes saturated. The arm 31 of the potentiometer is placed so that the positive charge of integrator 29 predominates.
For weak friction the positive integrator receives a weak charge and the negative integrator, owing to the extra amplifier 40, receives a much greater charge, so that it predominates. A positive output therefore indicates a strong frictional sound (e.g. "S") and a negative output represents a weak frictional sound ("f"). The output from integrator 44 is also connected to one end of a potentiometer 34, the other end of which is connected to the vowel output lead. An arm 35 gives an output representing voiced speech. The strong or weak friction signals, the voicing signals and signals distinguishing between particular sounds are combined in relay and diode gating circuits having weighted resistors adapted to give an output current representative of the word recognized. This may be registered on a meter 17, Fig. 5. Relays are used to determine the time sequence of the voicing and friction signals. The relay 77 is operated by a voicing signal to switch contact 76 and draw current through meter 17 to a -12 V source. Relays 85 and 86 also operate to switch contacts 91, so that if a friction signal Fw or Fs occurs after the voicing signal the relay Fwl or Fsl is operated, signifying "friction late" in the word. If the friction signal precedes the voicing signal, the other relays Fwe or Fse are operated, signifying "friction early." Contacts of these relays energize hold coils and draw predetermined currents through the meter, so that the total amount of current indicates which of seven words has been recognized. In the circuit of Fig. 6 the input signals are subjected to a phase shift determined by the resistor 93 of R-C circuit 93, 94. A fixed R-C circuit 96, 97 follows, and then two unidirectional parallel lines 98, 99 and 100, 101, which serve to cut out signals near the base line. Capacitors 103, 104 are charged by the resulting peaks and give a positive or negative response according to the vowel and the adjustment of the circuit.
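The relay sequencing of Fig. 5 can be illustrated in software. The following is only a minimal sketch, not the patented circuit: the event list, the function names and, in particular, the current weights are hypothetical, since the abstract gives neither the resistor values nor the seven words. Only the labels V (voicing), Fs, Fw, Fse, Fsl, Fwe and Fwl follow the text.

```python
# Hypothetical per-relay meter currents in arbitrary units (assumed values;
# the abstract does not state the actual weighted-resistor values).
WEIGHTS = {"V": 1, "Fse": 2, "Fsl": 4, "Fwe": 8, "Fwl": 16}


def friction_timing(events):
    """Classify friction as early or late relative to the first voicing.

    events: iterable of (time, kind), kind being "V", "Fs" or "Fw".
    Returns the set of latched relay names.
    """
    latched = set()
    voiced = False
    for _, kind in sorted(events):
        if kind == "V":
            voiced = True
            latched.add("V")
        elif kind in ("Fs", "Fw"):
            # "l" = friction late (after voicing), "e" = friction early.
            latched.add(kind + ("l" if voiced else "e"))
    return latched


def meter_current(latched, weights=WEIGHTS):
    """Sum the current contributed by each latched relay."""
    return sum(weights[name] for name in latched)


# Strong friction followed by voicing latches Fse and V.
relays = friction_timing([(0.00, "Fs"), (0.12, "V")])
print(sorted(relays), meter_current(relays))
```

With weights chosen as distinct powers of two, every combination of latched relays gives a different total, which is one way a single meter reading could separate several words.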
Particular pairs of words, such as "three" and "four" or "two" and "seven", can be distinguished by suitably adjusted circuits. Plosives, e.g. "t", may be detected by the circuit of Fig. 11, in which a pair of networks 170, 171 constitute a low-pass filter (about 10 cycles) and the signal envelope which passes is amplified at 173 and 174. In Fig. 9 there are three circuits 141-143 which indicate the presence of particular characteristics, and circuits 144-146 which distinguish between particular words. The signals are all applied to decision circuits 148 together with timing signals from a word time-base circuit 150 and a timing control circuit 151. The former responds to voicing and frictional sound to mark the start of a new word. The circuit 151 controls the operations after a word has finished and is designed to wait a given interval after the last occurrence of voicing or friction sounds. The output of the decision circuits passes to an adder-printer 152. The time-base circuit may again consist of relays adapted to complete different circuits according to their order of energization, and the contacts of these relays may form the decision network, giving an output on the appropriate one of sixteen leads (digits 0-9 and six control words). The timing circuit 151 has inputs from the three circuits 141-143 applied via diodes to a relay coil. When all three speech components cease, the relay falls out and picks up another relay (sample) adapted to transfer the output to the adder-printer solenoids. When this second relay falls out, a third relay energizes to give the reset pulse.
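The end-of-word behaviour of timing circuit 151 can be sketched as a small state machine. This is an illustrative model only: the frame-based input, the `hold_frames` parameter and the function name are assumptions, and the real circuit works with relay drop-out delays rather than discrete frames.

```python
def timing_control(frames, hold_frames):
    """Emit "sample" and "reset" events at the end of each word.

    frames: sequence of 3-tuples, one flag per characteristic circuit
            (141-143); a truthy flag means that component is present.
    hold_frames: assumed number of silent frames to wait before sampling.
    Returns a list of (frame_index, event_name).
    """
    events = []
    word_active = False    # first relay held in while speech is present
    silent = 0             # consecutive frames with no component present
    reset_pending = False  # sample relay has operated, reset comes next
    for i, frame in enumerate(frames):
        if reset_pending:
            events.append((i, "reset"))
            reset_pending = False
        if any(frame):
            # Diode-OR of the three inputs keeps the relay energized.
            word_active = True
            silent = 0
        elif word_active:
            silent += 1
            if silent >= hold_frames:
                # All components have ceased: transfer the decision output.
                events.append((i, "sample"))
                word_active = False
                silent = 0
                reset_pending = True
    return events


frames = [(1, 0, 0), (0, 1, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0)]
print(timing_control(frames, hold_frames=2))
```

In this model a reset is issued one frame after the sample, so a word that ends on the very last frame is sampled but not reset until further input arrives.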



